Implementing Matrix Multiplications on the Multi-Core CPU Architectures

Authors

  • Nakhoon Baek
  • Hwanyong Lee
Abstract

Recent commercial microprocessors concentrate on multi-core CPU architectures, while most parallel and/or distributed computing methods target multi-CPU architectures. Traditional parallel algorithms therefore need to be analyzed and adapted for the new multi-core environments. In this paper, we take matrix multiplication as the target problem and implement it using various methods, including the traditional serial version and parallel versions using OpenMP, Windows threads, and others. We measure the execution time of each implementation and analyze their overall performance. According to our experimental results, the most important factor in execution time is the efficient use of the level-2 caches in the CPU. We expect to develop a more efficient implementation method and to design a new matrix multiplication method for multi-core CPUs.

Key-Words: Multi-core CPU, parallel computing, performance analysis.
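The kind of comparison the abstract describes can be illustrated with a minimal sketch in C. It is not the authors' code: the function names `matmul_serial` and `matmul_omp`, the square row-major layout, and the i-k-j loop order (chosen here so the inner loop walks memory contiguously, which tends to be friendlier to the caches) are all assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical serial baseline: C = A * B for n x n row-major matrices.
 * The i-k-j loop order is an assumption of this sketch; it makes the
 * innermost loop read B and write C along contiguous rows. */
void matmul_serial(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++)
            c[i * n + j] = 0.0;
        for (size_t k = 0; k < n; k++) {
            double aik = a[i * n + k];
            for (size_t j = 0; j < n; j++)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}

/* Hypothetical OpenMP variant: rows of C are divided among threads.
 * Each thread writes only its own rows, so no synchronization is needed.
 * Without OpenMP enabled, the pragma is ignored and this runs serially. */
void matmul_omp(size_t n, const double *a, const double *b, double *c)
{
    long rows = (long)n;
    long i;

    #pragma omp parallel for
    for (i = 0; i < rows; i++) {
        size_t row = (size_t)i * n;
        for (size_t j = 0; j < n; j++)
            c[row + j] = 0.0;
        for (size_t k = 0; k < n; k++) {
            double aik = a[row + k];
            for (size_t j = 0; j < n; j++)
                c[row + j] += aik * b[k * n + j];
        }
    }
}
```

With GCC or Clang, compiling with `-fopenmp` enables the parallel version. Timing both functions on the same inputs for growing `n` is one simple way to observe how the working set interacts with the cache sizes.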


Similar resources

"Wide or tall" and "sparse matrix dense matrix" multiplications

This note explores sparse matrix dense matrix (SMDM) multiplications, useful in block Krylov or block Lanczos methods. SMDM computations are AU and VA: the multiplication of a large sparse m × n matrix A by a matrix V of k rows of length m or a matrix U of k columns of length n, with k << m and k << n. In a block Lanczos or Krylov algorithm, matrix-matrix multiplications with the "tall" U and "w...

Finite element assembly strategies on multi- and many-core architectures

We demonstrate that radically differing implementations of finite element methods are needed on multicore (CPU) and many-core (GPU) architectures, if their respective performance potential is to be realised. Our experimental investigations using a finite element advection-diffusion solver show that increased performance on each architecture can only be achieved by committing to specific and div...

Many-body quantum chemistry on graphics processing units

Heterogeneous nodes composed of a multicore CPU and at least one graphics processing unit (GPU) are increasingly common in high-performance scientific computing, and significant programming effort is currently being undertaken to port existing scientific algorithms to these unique architectures. We present implementations for two many-body quantum chemistry methods on heterogeneous nodes: the c...

Sparse-matrix vector multiplication on hybrid CPU+GPU platform

Sparse-matrix vector multiplication (SpMV) is a basic operation in many linear algebra kernels, so it is interesting to have an SpMV on modern architectures like the GPU. Because it is an irregular computation, the CPU also performs comparably to the GPU, so it is interesting to have this routine on hybrid architectures like CPU+GPU. We have therefore designed a hybrid algorithm for SpMV which uses a CPU and a GPU. We have ex...

A Data-Parallel Algorithmic Modelica Extension for Efficient Execution on Multi-Core Platforms

New multi-core CPU and GPU architectures promise high computational power at a low cost if suitable computational algorithms can be developed. However, parallel programming for such architectures is usually non-portable, low-level and error-prone. To make the computational power of new multi-core architectures more easily available to Modelica modelers, we have developed the ParModelica algorit...


Publication date: 1988